
    OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content

    The OpenMinTeD platform aims to bring full-text Open Access scholarly content from a wide range of providers together with Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers in an integrated environment. In this way, it gives users who want to mine scientific literature easy access to relevant content and lets them run scalable TDM workflows in the cloud.

    Automatic summarising based on sentence extraction: A statistical approach

    This dissertation and project describe a system for automatically summarising texts. Instead of generating abstracts, a hard NLP task of questionable effectiveness, the system tries to identify the most important sentences of the original text, thus producing an extract. The proposed corpus-based, statistical approach exploits several heuristics to determine the summary-worthiness of sentences. It uses the statistical occurrence of words, word pairs and noun phrases to calculate sentence weights and then extracts the highest-scoring sentences. The statistical model used in the scoring function is a slight variation of the Term-Frequency Inverse-Document-Frequency (TFIDF) term weighting formula. The system was applied to five test texts, and the results, obtained separately and cumulatively for the three aforementioned heuristics, were subjectively judged for quality. This evaluation showed that the noun phrases, as a separate method, produce the best extracts and ..
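
The sentence-extraction idea in this abstract can be illustrated with a minimal sketch. It is not the dissertation's system: the abstract does not specify its TFIDF variation, so a common smoothed IDF is assumed here, only single words are scored (no word pairs or noun phrases), and the names `tokenize`, `split_sentences`, `idf_table` and `extract_summary` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def idf_table(corpus):
    """Smoothed IDF per word over a background corpus (assumed variant)."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each word once per document
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    default = math.log(1 + n) + 1.0  # words never seen in the corpus
    return idf, default


def extract_summary(text, corpus, k=1):
    """Return the k highest-scoring sentences, in their original order."""
    idf, default = idf_table(corpus)
    tf = Counter(tokenize(text))  # term frequency within the text itself
    sentences = split_sentences(text)
    scored = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        if not words:
            continue
        # Sentence weight: mean TF * IDF over the sentence's words.
        weight = sum(tf[w] * idf.get(w, default) for w in words) / len(words)
        scored.append((weight, i))
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]
```

Averaging the word weights rather than summing them keeps long sentences from winning on length alone; that normalisation is a design choice of this sketch, not something the abstract states.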